Overview

Dataset statistics

Number of variables: 29
Number of observations: 1000000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 221.3 MiB
Average record size in memory: 232.0 B
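The two memory figures above can be cross-checked against each other. The sketch below uses only numbers from the table; treating every column as costing the same number of bytes per row is an assumption made for illustration, not something the report states.

```python
# Figures copied from the "Dataset statistics" table.
N_VARIABLES = 29
N_OBSERVATIONS = 1_000_000
TOTAL_MIB = 221.3

MIB = 1024 ** 2  # bytes per MiB


def average_record_bytes(total_mib: float, n_rows: int) -> float:
    """Average bytes per row implied by the total in-memory size."""
    return total_mib * MIB / n_rows


avg = average_record_bytes(TOTAL_MIB, N_OBSERVATIONS)
# Assumption: each of the 29 columns costs the same per row.
per_cell = avg / N_VARIABLES

print(f"average record size: {avg:.1f} B")
print(f"bytes per cell:      {per_cell:.2f}")
```

Under that assumption each cell costs about 8 bytes, which would also match the 7.6 MiB memory size reported for each individual variable (8 bytes × 1000000 rows ≈ 7.63 MiB).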

Variable types

Categorical: 15
Text: 9
Numeric: 5

Alerts

Claim_Date has constant value "2024-04-24" (Constant)
Claim_ID has unique values (Unique)
Phone_Number has unique values (Unique)
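The Constant and Unique alerts follow from the distinct-value counts reported for each variable. The function below is a simplified sketch of that relationship, not ydata-profiling's actual alert logic; the name `alert_for` and its rules are hypothetical.

```python
from typing import Optional


def alert_for(n_distinct: int, n_rows: int) -> Optional[str]:
    """Return a simplified alert label from a distinct-value count."""
    if n_distinct == 1:
        return "Constant"   # every row holds the same value
    if n_distinct == n_rows:
        return "Unique"     # no value repeats
    return None


N_ROWS = 1_000_000
# Distinct counts as reported in the Variables section of this report.
distinct_counts = {"Claim_Date": 1, "Claim_ID": 1_000_000, "Provider_ID": 5}

alerts = {name: alert_for(d, N_ROWS) for name, d in distinct_counts.items()}
print(alerts)
```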

Reproduction

Analysis started: 2024-05-30 06:19:19.927054
Analysis finished: 2024-05-30 06:25:55.232539
Duration: 6 minutes and 35.31 seconds
Software version: ydata-profiling v4.8.3
Download configuration: config.json
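The reported duration can be reproduced from the two timestamps. This is a minimal check using only the standard library; the format string is an assumption about how the timestamps are laid out.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the report's timestamps

started = datetime.strptime("2024-05-30 06:19:19.927054", FMT)
finished = datetime.strptime("2024-05-30 06:25:55.232539", FMT)

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```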

Variables

Provider_ID
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.6 MiB
Eastern Hospital
200827 
Sky Hospital
199948 
Moon Healthcare
199909 
Asian Medical Center
199835 
Sun Clinic
199481 

Length

Max length: 20
Median length: 15
Mean length: 14.602753
Min length: 10

Characters and Unicode

Total characters: 14602753
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
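As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below maps the two-letter codes to the long names used in this report; the helper `category_counts` is hypothetical, and scripts and blocks are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count characters of `text` by Unicode general category."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


counts = category_counts("Asian Medical Center")
print(counts)
```

The same helper applied to a value such as `CLAIM_1` yields uppercase letters, one connector punctuation character and one decimal number.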

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Asian Medical Center
2nd row: Sky Hospital
3rd row: Moon Healthcare
4th row: Sky Hospital
5th row: Sun Clinic

Common Values

Value                  Count    Frequency (%)
Eastern Hospital       200827   20.1%
Sky Hospital           199948   20.0%
Moon Healthcare        199909   20.0%
Asian Medical Center   199835   20.0%
Sun Clinic             199481   19.9%
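The frequency column is each count divided by the total number of observations. A minimal sketch, using the counts from the Common Values table:

```python
# Counts copied from the Provider_ID Common Values table.
provider_counts = {
    "Eastern Hospital": 200827,
    "Sky Hospital": 199948,
    "Moon Healthcare": 199909,
    "Asian Medical Center": 199835,
    "Sun Clinic": 199481,
}

total = sum(provider_counts.values())
shares = {value: 100 * count / total for value, count in provider_counts.items()}

for value, share in shares.items():
    print(f"{value:<22}{provider_counts[value]:>8}{share:>7.1f}%")
```

The counts sum to the 1000000 observations in the dataset, so no rows fall outside the five listed providers.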

Length

Histogram of lengths of the category

Common Values (Plot)

Value        Count    Frequency (%)
hospital     400775   18.2%
eastern      200827   9.1%
sky          199948   9.1%
moon         199909   9.1%
healthcare   199909   9.1%
asian        199835   9.1%
medical      199835   9.1%
center       199835   9.1%
sun          199481   9.1%
clinic       199481   9.1%
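The word-level counts above can be derived from the category counts by splitting each value into lowercase tokens and weighting each token by its value's count. The whitespace split and lowercasing are assumptions inferred from the table, and the profiler's own tokenizer may differ.

```python
from collections import Counter

# Counts copied from the Provider_ID Common Values table.
value_counts = {
    "Eastern Hospital": 200827,
    "Sky Hospital": 199948,
    "Moon Healthcare": 199909,
    "Asian Medical Center": 199835,
    "Sun Clinic": 199481,
}

word_counts = Counter()
for value, count in value_counts.items():
    # Assumption: tokens are lowercase, whitespace-separated words.
    for word in value.lower().split():
        word_counts[word] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common():
    print(f"{word:<12}{count:>8}{100 * count / total_words:>7.1f}%")
```

Because "hospital" appears in two provider names, its count is the sum of both, and percentages are taken over all words rather than over rows.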

Most occurring characters

ValueCountFrequency (%)
a 1401090
 
9.6%
e 1200150
 
8.2%
(space) 1199835
 
8.2%
i 1199407
 
8.2%
n 1199368
 
8.2%
t 1001346
 
6.9%
l 1000000
 
6.8%
s 801437
 
5.5%
o 800593
 
5.5%
H 600684
 
4.1%
Other values (13) 4198843
28.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11203083
76.7%
Uppercase Letter 2199835
 
15.1%
Space Separator 1199835
 
8.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1401090
12.5%
e 1200150
10.7%
i 1199407
10.7%
n 1199368
10.7%
t 1001346
8.9%
l 1000000
8.9%
s 801437
7.2%
o 800593
7.1%
r 600571
5.4%
c 599225
5.3%
Other values (6) 1399896
12.5%
Uppercase Letter
ValueCountFrequency (%)
H 600684
27.3%
M 399744
18.2%
S 399429
18.2%
C 399316
18.2%
E 200827
 
9.1%
A 199835
 
9.1%
Space Separator
ValueCountFrequency (%)
(space) 1199835
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 13402918
91.8%
Common 1199835
 
8.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1401090
 
10.5%
e 1200150
 
9.0%
i 1199407
 
8.9%
n 1199368
 
8.9%
t 1001346
 
7.5%
l 1000000
 
7.5%
s 801437
 
6.0%
o 800593
 
6.0%
H 600684
 
4.5%
r 600571
 
4.5%
Other values (12) 3598272
26.8%
Common
ValueCountFrequency (%)
(space) 1199835
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14602753
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 1401090
 
9.6%
e 1200150
 
8.2%
(space) 1199835
 
8.2%
i 1199407
 
8.2%
n 1199368
 
8.2%
t 1001346
 
6.9%
l 1000000
 
6.8%
s 801437
 
5.5%
o 800593
 
5.5%
H 600684
 
4.1%
Other values (13) 4198843
28.8%

Claim_ID
Text

UNIQUE 

Distinct: 1000000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.6 MiB

Length

Max length: 13
Median length: 12
Mean length: 11.888896
Min length: 7
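These length statistics are what one would expect if the identifiers run consecutively from CLAIM_1 to CLAIM_1000000. That numbering is an assumption based on the sample rows and the unique count; the report does not state it. A short check under that assumption:

```python
from statistics import median

PREFIX = "CLAIM_"
N = 1_000_000  # number of observations in the report

# Assumption: identifiers are CLAIM_1 ... CLAIM_1000000 with no gaps.
lengths = [len(PREFIX) + len(str(i)) for i in range(1, N + 1)]

total_chars = sum(lengths)
digit_chars = total_chars - len(PREFIX) * N

print("min:", min(lengths))
print("median:", median(lengths))
print("max:", max(lengths))
print("mean:", total_chars / N)
print("digit characters:", digit_chars)
```

Under this assumption the total character count and the number of digit characters agree with the 11888896 total characters and 5888896 Decimal Number characters listed for this variable.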

Characters and Unicode

Total characters: 11888896
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1000000
Unique (%): 100.0%

Sample

1st row: CLAIM_1
2nd row: CLAIM_2
3rd row: CLAIM_3
4th row: CLAIM_4
5th row: CLAIM_5
ValueCountFrequency (%)
claim_1 1
 
< 0.1%
claim_32 1
 
< 0.1%
claim_30 1
 
< 0.1%
claim_15 1
 
< 0.1%
claim_3 1
 
< 0.1%
claim_4 1
 
< 0.1%
claim_5 1
 
< 0.1%
claim_6 1
 
< 0.1%
claim_7 1
 
< 0.1%
claim_8 1
 
< 0.1%
Other values (999990) 999990
> 99.9%

Most occurring characters

ValueCountFrequency (%)
C 1000000
 
8.4%
L 1000000
 
8.4%
A 1000000
 
8.4%
I 1000000
 
8.4%
M 1000000
 
8.4%
_ 1000000
 
8.4%
1 600001
 
5.0%
6 600000
 
5.0%
5 600000
 
5.0%
8 600000
 
5.0%
Other values (6) 3488895
29.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 5888896
49.5%
Uppercase Letter 5000000
42.1%
Connector Punctuation 1000000
 
8.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 600001
10.2%
6 600000
10.2%
5 600000
10.2%
8 600000
10.2%
2 600000
10.2%
3 600000
10.2%
4 600000
10.2%
7 600000
10.2%
9 600000
10.2%
0 488895
8.3%
Uppercase Letter
ValueCountFrequency (%)
C 1000000
20.0%
L 1000000
20.0%
A 1000000
20.0%
I 1000000
20.0%
M 1000000
20.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1000000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 6888896
57.9%
Latin 5000000
42.1%

Most frequent character per script

Common
ValueCountFrequency (%)
_ 1000000
14.5%
1 600001
8.7%
6 600000
8.7%
5 600000
8.7%
8 600000
8.7%
2 600000
8.7%
3 600000
8.7%
4 600000
8.7%
7 600000
8.7%
9 600000
8.7%
Latin
ValueCountFrequency (%)
C 1000000
20.0%
L 1000000
20.0%
A 1000000
20.0%
I 1000000
20.0%
M 1000000
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11888896
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 1000000
 
8.4%
L 1000000
 
8.4%
A 1000000
 
8.4%
I 1000000
 
8.4%
M 1000000
 
8.4%
_ 1000000
 
8.4%
1 600001
 
5.0%
6 600000
 
5.0%
5 600000
 
5.0%
8 600000
 
5.0%
Other values (6) 3488895
29.3%
Distinct329922
Distinct (%)33.0%
Missing0
Missing (%)0.0%
Memory size7.6 MiB

Length

Max length30
Median length28
Mean length13.276224
Min length5

Characters and Unicode

Total characters13276224
Distinct characters54
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique177047 ?
Unique (%)17.7%

Sample

1st rowDarrell Blair
2nd rowWilliam Young
3rd rowKeith Reynolds
4th rowAndre Kelly
5th rowTerry Gonzales
ValueCountFrequency (%)
michael 22935
 
1.1%
smith 21674
 
1.1%
johnson 17238
 
0.8%
james 16825
 
0.8%
david 15947
 
0.8%
jennifer 14751
 
0.7%
john 14270
 
0.7%
williams 13960
 
0.7%
christopher 13958
 
0.7%
thomas 13686
 
0.7%
Other values (1588) 1879531
91.9%

Most occurring characters

ValueCountFrequency (%)
e 1234816
 
9.3%
a 1225484
 
9.2%
(space) 1044775
 
7.9%
n 997504
 
7.5%
r 953421
 
7.2%
i 804342
 
6.1%
o 716050
 
5.4%
l 673909
 
5.1%
s 599690
 
4.5%
t 461198
 
3.5%
Other values (44) 4565035
34.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10133023
76.3%
Uppercase Letter 2077116
 
15.6%
Space Separator 1044775
 
7.9%
Other Punctuation 21310
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1234816
12.2%
a 1225484
12.1%
n 997504
9.8%
r 953421
9.4%
i 804342
 
7.9%
o 716050
 
7.1%
l 673909
 
6.7%
s 599690
 
5.9%
t 461198
 
4.6%
h 447234
 
4.4%
Other values (16) 2019375
19.9%
Uppercase Letter
ValueCountFrequency (%)
M 230824
 
11.1%
J 205645
 
9.9%
S 170291
 
8.2%
C 155605
 
7.5%
D 139935
 
6.7%
R 128561
 
6.2%
B 128232
 
6.2%
A 127158
 
6.1%
W 98467
 
4.7%
H 95791
 
4.6%
Other values (16) 596607
28.7%
Space Separator
ValueCountFrequency (%)
(space) 1044775
100.0%
Other Punctuation
ValueCountFrequency (%)
. 21310
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12210139
92.0%
Common 1066085
 
8.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1234816
 
10.1%
a 1225484
 
10.0%
n 997504
 
8.2%
r 953421
 
7.8%
i 804342
 
6.6%
o 716050
 
5.9%
l 673909
 
5.5%
s 599690
 
4.9%
t 461198
 
3.8%
h 447234
 
3.7%
Other values (42) 4096491
33.5%
Common
ValueCountFrequency (%)
(space) 1044775
98.0%
. 21310
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13276224
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1234816
 
9.3%
a 1225484
 
9.2%
(space) 1044775
 
7.9%
n 997504
 
7.5%
r 953421
 
7.2%
i 804342
 
6.1%
o 716050
 
5.4%
l 673909
 
5.1%
s 599690
 
4.5%
t 461198
 
3.5%
Other values (44) 4565035
34.4%
Distinct900
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB

Length

Max length6
Median length6
Mean length6
Min length6

Characters and Unicode

Total characters6000000
Distinct characters13
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowDX_714
2nd rowDX_885
3rd rowDX_988
4th rowDX_779
5th rowDX_644
ValueCountFrequency (%)
dx_715 1197
 
0.1%
dx_285 1195
 
0.1%
dx_838 1195
 
0.1%
dx_514 1192
 
0.1%
dx_508 1190
 
0.1%
dx_666 1190
 
0.1%
dx_421 1186
 
0.1%
dx_667 1185
 
0.1%
dx_566 1185
 
0.1%
dx_848 1180
 
0.1%
Other values (890) 988105
98.8%

Most occurring characters

ValueCountFrequency (%)
D 1000000
16.7%
X 1000000
16.7%
_ 1000000
16.7%
3 311682
 
5.2%
8 311624
 
5.2%
4 311624
 
5.2%
6 311340
 
5.2%
1 311331
 
5.2%
5 311188
 
5.2%
9 310735
 
5.2%
Other values (3) 820476
13.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3000000
50.0%
Uppercase Letter 2000000
33.3%
Connector Punctuation 1000000
 
16.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 311682
10.4%
8 311624
10.4%
4 311624
10.4%
6 311340
10.4%
1 311331
10.4%
5 311188
10.4%
9 310735
10.4%
2 310636
10.4%
7 310502
10.4%
0 199338
6.6%
Uppercase Letter
ValueCountFrequency (%)
D 1000000
50.0%
X 1000000
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1000000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4000000
66.7%
Latin 2000000
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
_ 1000000
25.0%
3 311682
 
7.8%
8 311624
 
7.8%
4 311624
 
7.8%
6 311340
 
7.8%
1 311331
 
7.8%
5 311188
 
7.8%
9 310735
 
7.8%
2 310636
 
7.8%
7 310502
 
7.8%
Latin
ValueCountFrequency (%)
D 1000000
50.0%
X 1000000
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6000000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
D 1000000
16.7%
X 1000000
16.7%
_ 1000000
16.7%
3 311682
 
5.2%
8 311624
 
5.2%
4 311624
 
5.2%
6 311340
 
5.2%
1 311331
 
5.2%
5 311188
 
5.2%
9 310735
 
5.2%
Other values (3) 820476
13.7%
Distinct9000
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Memory size7.6 MiB

Length

Max length9
Median length9
Mean length9
Min length9

Characters and Unicode

Total characters9000000
Distinct characters15
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPROC_2648
2nd rowPROC_9084
3rd rowPROC_9747
4th rowPROC_4334
5th rowPROC_8408
ValueCountFrequency (%)
proc_5757 150
 
< 0.1%
proc_7096 148
 
< 0.1%
proc_3766 148
 
< 0.1%
proc_3818 146
 
< 0.1%
proc_6126 145
 
< 0.1%
proc_4933 145
 
< 0.1%
proc_6065 144
 
< 0.1%
proc_1628 144
 
< 0.1%
proc_4622 144
 
< 0.1%
proc_7294 144
 
< 0.1%
Other values (8990) 998542
99.9%

Most occurring characters

ValueCountFrequency (%)
P 1000000
11.1%
R 1000000
11.1%
O 1000000
11.1%
C 1000000
11.1%
_ 1000000
11.1%
7 412180
 
4.6%
1 411808
 
4.6%
9 411421
 
4.6%
5 411343
 
4.6%
4 411120
 
4.6%
Other values (5) 1942128
21.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4000000
44.4%
Decimal Number 4000000
44.4%
Connector Punctuation 1000000
 
11.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
7 412180
10.3%
1 411808
10.3%
9 411421
10.3%
5 411343
10.3%
4 411120
10.3%
3 410897
10.3%
6 410830
10.3%
2 410730
10.3%
8 410263
10.3%
0 299408
7.5%
Uppercase Letter
ValueCountFrequency (%)
P 1000000
25.0%
R 1000000
25.0%
O 1000000
25.0%
C 1000000
25.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1000000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5000000
55.6%
Latin 4000000
44.4%

Most frequent character per script

Common
ValueCountFrequency (%)
_ 1000000
20.0%
7 412180
8.2%
1 411808
8.2%
9 411421
8.2%
5 411343
8.2%
4 411120
8.2%
3 410897
8.2%
6 410830
8.2%
2 410730
8.2%
8 410263
8.2%
Latin
ValueCountFrequency (%)
P 1000000
25.0%
R 1000000
25.0%
O 1000000
25.0%
C 1000000
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9000000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
P 1000000
11.1%
R 1000000
11.1%
O 1000000
11.1%
C 1000000
11.1%
_ 1000000
11.1%
7 412180
 
4.6%
1 411808
 
4.6%
9 411421
 
4.6%
5 411343
 
4.6%
4 411120
 
4.6%
Other values (5) 1942128
21.6%

Claim_Date
Categorical

CONSTANT 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
2024-04-24
1000000 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters10000000
Distinct characters4
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2024-04-24
2nd row2024-04-24
3rd row2024-04-24
4th row2024-04-24
5th row2024-04-24

Common Values

ValueCountFrequency (%)
2024-04-24 1000000
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2024-04-24 1000000
100.0%

Most occurring characters

ValueCountFrequency (%)
2 3000000
30.0%
4 3000000
30.0%
0 2000000
20.0%
- 2000000
20.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8000000
80.0%
Dash Punctuation 2000000
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 3000000
37.5%
4 3000000
37.5%
0 2000000
25.0%
Dash Punctuation
ValueCountFrequency (%)
- 2000000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 10000000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 3000000
30.0%
4 3000000
30.0%
0 2000000
20.0%
- 2000000
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10000000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 3000000
30.0%
4 3000000
30.0%
0 2000000
20.0%
- 2000000
20.0%

Admission_Date
Categorical

Distinct30
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
2024-04-15
 
33573
2024-04-02
 
33568
2024-04-04
 
33564
2024-04-17
 
33534
2024-03-31
 
33511
Other values (25)
832250 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters10000000
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2024-03-26
2nd row2024-04-07
3rd row2024-04-01
4th row2024-03-31
5th row2024-03-27

Common Values

ValueCountFrequency (%)
2024-04-15 33573
 
3.4%
2024-04-02 33568
 
3.4%
2024-04-04 33564
 
3.4%
2024-04-17 33534
 
3.4%
2024-03-31 33511
 
3.4%
2024-04-09 33498
 
3.3%
2024-04-20 33493
 
3.3%
2024-04-01 33488
 
3.3%
2024-03-28 33473
 
3.3%
2024-03-27 33453
 
3.3%
Other values (20) 664845
66.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2024-04-15 33573
 
3.4%
2024-04-02 33568
 
3.4%
2024-04-04 33564
 
3.4%
2024-04-17 33534
 
3.4%
2024-03-31 33511
 
3.4%
2024-04-09 33498
 
3.3%
2024-04-20 33493
 
3.3%
2024-04-01 33488
 
3.3%
2024-03-28 33473
 
3.3%
2024-03-27 33453
 
3.3%
Other values (20) 664845
66.5%

Most occurring characters

ValueCountFrequency (%)
0 2400609
24.0%
2 2399990
24.0%
- 2000000
20.0%
4 1833573
18.3%
1 466098
 
4.7%
3 399544
 
4.0%
7 100409
 
1.0%
5 100215
 
1.0%
8 99941
 
1.0%
9 99911
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8000000
80.0%
Dash Punctuation 2000000
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2400609
30.0%
2 2399990
30.0%
4 1833573
22.9%
1 466098
 
5.8%
3 399544
 
5.0%
7 100409
 
1.3%
5 100215
 
1.3%
8 99941
 
1.2%
9 99911
 
1.2%
6 99710
 
1.2%
Dash Punctuation
ValueCountFrequency (%)
- 2000000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 10000000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2400609
24.0%
2 2399990
24.0%
- 2000000
20.0%
4 1833573
18.3%
1 466098
 
4.7%
3 399544
 
4.0%
7 100409
 
1.0%
5 100215
 
1.0%
8 99941
 
1.0%
9 99911
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10000000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2400609
24.0%
2 2399990
24.0%
- 2000000
20.0%
4 1833573
18.3%
1 466098
 
4.7%
3 399544
 
4.0%
7 100409
 
1.0%
5 100215
 
1.0%
8 99941
 
1.0%
9 99911
 
1.0%

Discharge_Date
Categorical

Distinct30
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
2024-05-04
 
33631
2024-05-24
 
33623
2024-04-27
 
33620
2024-05-01
 
33515
2024-05-08
 
33493
Other values (25)
832118 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters10000000
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2024-05-08
2nd row2024-05-03
3rd row2024-05-24
4th row2024-04-27
5th row2024-05-12

Common Values

ValueCountFrequency (%)
2024-05-04 33631
 
3.4%
2024-05-24 33623
 
3.4%
2024-04-27 33620
 
3.4%
2024-05-01 33515
 
3.4%
2024-05-08 33493
 
3.3%
2024-05-02 33430
 
3.3%
2024-05-18 33424
 
3.3%
2024-05-22 33417
 
3.3%
2024-05-19 33409
 
3.3%
2024-05-09 33384
 
3.3%
Other values (20) 665054
66.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2024-05-04 33631
 
3.4%
2024-05-24 33623
 
3.4%
2024-04-27 33620
 
3.4%
2024-05-01 33515
 
3.4%
2024-05-08 33493
 
3.3%
2024-05-02 33430
 
3.3%
2024-05-18 33424
 
3.3%
2024-05-22 33417
 
3.3%
2024-05-19 33409
 
3.3%
2024-05-09 33384
 
3.3%
Other values (20) 665054
66.5%

Most occurring characters

ValueCountFrequency (%)
2 2433572
24.3%
0 2400020
24.0%
- 2000000
20.0%
4 1300618
13.0%
5 899670
 
9.0%
1 433168
 
4.3%
3 132957
 
1.3%
9 100117
 
1.0%
8 100083
 
1.0%
7 100073
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8000000
80.0%
Dash Punctuation 2000000
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 2433572
30.4%
0 2400020
30.0%
4 1300618
16.3%
5 899670
 
11.2%
1 433168
 
5.4%
3 132957
 
1.7%
9 100117
 
1.3%
8 100083
 
1.3%
7 100073
 
1.3%
6 99722
 
1.2%
Dash Punctuation
ValueCountFrequency (%)
- 2000000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 10000000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 2433572
24.3%
0 2400020
24.0%
- 2000000
20.0%
4 1300618
13.0%
5 899670
 
9.0%
1 433168
 
4.3%
3 132957
 
1.3%
9 100117
 
1.0%
8 100083
 
1.0%
7 100073
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10000000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 2433572
24.3%
0 2400020
24.0%
- 2000000
20.0%
4 1300618
13.0%
5 899670
 
9.0%
1 433168
 
4.3%
3 132957
 
1.3%
9 100117
 
1.0%
8 100083
 
1.0%
7 100073
 
1.0%

Claim_Amount
Real number (ℝ)

Distinct: 629499
Distinct (%): 62.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5049.9249
Minimum: 100.02
Maximum: 9999.99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 100.02
5-th percentile: 596.46
Q1: 2577.415
median: 5051.705
Q3: 7522.1925
95-th percentile: 9505.78
Maximum: 9999.99
Range: 9899.97
Interquartile range (IQR): 4944.7775

Descriptive statistics

Standard deviation: 2856.3685
Coefficient of variation (CV): 0.56562594
Kurtosis: -1.198428
Mean: 5049.9249
Median Absolute Deviation (MAD): 2472.43
Skewness: 0.00013385167
Sum: 5.0499249 × 10^9
Variance: 8158841.2
Monotonicity: Not monotonic
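Several of the Claim_Amount statistics are derived from others in the same tables: the range from minimum and maximum, the IQR from the quartiles, the coefficient of variation from standard deviation and mean, and the variance from the standard deviation. The sketch below recomputes them from the rounded figures shown, so small differences in the last digits are expected.

```python
# Figures copied from the Claim_Amount statistics tables.
minimum, maximum = 100.02, 9999.99
q1, q3 = 2577.415, 7522.1925
mean, std = 5049.9249, 2856.3685
n_rows = 1_000_000

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance (from the rounded std)
total = mean * n_rows             # Sum

print(f"range    = {value_range:.2f}")
print(f"IQR      = {iqr:.4f}")
print(f"CV       = {cv:.8f}")
print(f"variance = {variance:.1f}")
print(f"sum      = {total:.7e}")
```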
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
8765.06 8
 
< 0.1%
6980.66 8
 
< 0.1%
8646.11 8
 
< 0.1%
3208.51 8
 
< 0.1%
3010.55 8
 
< 0.1%
7155.47 8
 
< 0.1%
9082.22 8
 
< 0.1%
1771.85 8
 
< 0.1%
828.37 7
 
< 0.1%
4566.8 7
 
< 0.1%
Other values (629489) 999922
> 99.9%
ValueCountFrequency (%)
100.02 1
 
< 0.1%
100.03 2
< 0.1%
100.06 1
 
< 0.1%
100.07 2
< 0.1%
100.08 3
< 0.1%
100.09 1
 
< 0.1%
100.11 3
< 0.1%
100.12 1
 
< 0.1%
100.13 1
 
< 0.1%
100.14 1
 
< 0.1%
ValueCountFrequency (%)
9999.99 1
 
< 0.1%
9999.95 1
 
< 0.1%
9999.94 2
< 0.1%
9999.93 1
 
< 0.1%
9999.92 3
< 0.1%
9999.91 2
< 0.1%
9999.89 1
 
< 0.1%
9999.88 2
< 0.1%
9999.85 2
< 0.1%
9999.83 1
 
< 0.1%

Paid_Amount
Real number (ℝ)

Distinct: 630895
Distinct (%): 63.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5023.6087
Minimum: 50.05
Maximum: 10000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 50.05
5-th percentile: 551.4195
Q1: 2538.36
median: 5023.16
Q3: 7509.83
95-th percentile: 9501.0705
Maximum: 10000
Range: 9949.95
Interquartile range (IQR): 4971.47

Descriptive statistics

Standard deviation: 2870.8348
Coefficient of variation (CV): 0.57146863
Kurtosis: -1.1993951
Mean: 5023.6087
Median Absolute Deviation (MAD): 2485.845
Skewness: 0.00083578856
Sum: 5.0236087 × 10^9
Variance: 8241692.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
3603.76 9
 
< 0.1%
7229.21 9
 
< 0.1%
1534.1 8
 
< 0.1%
7144.82 8
 
< 0.1%
4222.67 8
 
< 0.1%
3293.1 8
 
< 0.1%
2749.41 8
 
< 0.1%
2415.43 8
 
< 0.1%
7272.34 8
 
< 0.1%
7135.83 8
 
< 0.1%
Other values (630885) 999918
> 99.9%
ValueCountFrequency (%)
50.05 3
< 0.1%
50.06 1
 
< 0.1%
50.08 3
< 0.1%
50.1 1
 
< 0.1%
50.11 4
< 0.1%
50.17 2
< 0.1%
50.18 2
< 0.1%
50.19 1
 
< 0.1%
50.2 3
< 0.1%
50.21 2
< 0.1%
ValueCountFrequency (%)
10000 1
< 0.1%
9999.96 1
< 0.1%
9999.92 2
< 0.1%
9999.91 1
< 0.1%
9999.88 2
< 0.1%
9999.86 1
< 0.1%
9999.85 1
< 0.1%
9999.82 1
< 0.1%
9999.77 2
< 0.1%
9999.76 1
< 0.1%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
Cardiology
334684 
General Medicine
332797 
Orthopedics
332519 

Length

Max length16
Median length11
Mean length12.329301
Min length10

Characters and Unicode

Total characters12329301
Distinct characters20
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowOrthopedics
2nd rowCardiology
3rd rowOrthopedics
4th rowCardiology
5th rowOrthopedics

Common Values

ValueCountFrequency (%)
Cardiology 334684
33.5%
General Medicine 332797
33.3%
Orthopedics 332519
33.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
cardiology 334684
25.1%
general 332797
25.0%
medicine 332797
25.0%
orthopedics 332519
24.9%

Most occurring characters

ValueCountFrequency (%)
e 1663707
13.5%
i 1332797
10.8%
o 1001887
 
8.1%
r 1000000
 
8.1%
d 1000000
 
8.1%
a 667481
 
5.4%
l 667481
 
5.4%
n 665594
 
5.4%
c 665316
 
5.4%
C 334684
 
2.7%
Other values (10) 3330354
27.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10663707
86.5%
Uppercase Letter 1332797
 
10.8%
Space Separator 332797
 
2.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1663707
15.6%
i 1332797
12.5%
o 1001887
9.4%
r 1000000
9.4%
d 1000000
9.4%
a 667481
6.3%
l 667481
6.3%
n 665594
 
6.2%
c 665316
 
6.2%
y 334684
 
3.1%
Other values (5) 1664760
15.6%
Uppercase Letter
ValueCountFrequency (%)
C 334684
25.1%
G 332797
25.0%
M 332797
25.0%
O 332519
24.9%
Space Separator
ValueCountFrequency (%)
(space) 332797
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11996504
97.3%
Common 332797
 
2.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1663707
13.9%
i 1332797
11.1%
o 1001887
 
8.4%
r 1000000
 
8.3%
d 1000000
 
8.3%
a 667481
 
5.6%
l 667481
 
5.6%
n 665594
 
5.5%
c 665316
 
5.5%
C 334684
 
2.8%
Other values (9) 2997557
25.0%
Common
ValueCountFrequency (%)
(space) 332797
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12329301
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1663707
13.5%
i 1332797
10.8%
o 1001887
 
8.1%
r 1000000
 
8.1%
d 1000000
 
8.1%
a 667481
 
5.4%
l 667481
 
5.4%
n 665594
 
5.4%
c 665316
 
5.4%
C 334684
 
2.7%
Other values (10) 3330354
27.0%

Patient_Age
Real number (ℝ)

Distinct: 73
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.988752
Minimum: 18
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 18
5-th percentile: 21
Q1: 36
median: 54
Q3: 72
95-th percentile: 87
Maximum: 90
Range: 72
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 21.079818
Coefficient of variation (CV): 0.39044834
Kurtosis: -1.2022395
Mean: 53.988752
Median Absolute Deviation (MAD): 18
Skewness: 0.00021125585
Sum: 53988752
Variance: 444.35875
Monotonicity: Not monotonic
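For comparison, the textbook moments of a discrete uniform distribution over the integers 18 to 90 can be computed in closed form. This is only a reference point for reading the table: the report does not say how Patient_Age was generated, and closeness of the figures does not establish that it is uniform.

```python
# Reference moments of a discrete uniform distribution on the integers
# low..high. The bounds are the reported Minimum and Maximum.
low, high = 18, 90
n = high - low + 1                    # number of distinct values

mean = (low + high) / 2
variance = (n ** 2 - 1) / 12
std = variance ** 0.5
excess_kurtosis = -6 * (n ** 2 + 1) / (5 * (n ** 2 - 1))

print(f"distinct values : {n}")
print(f"mean            : {mean}")
print(f"variance        : {variance}")
print(f"std             : {std:.4f}")
print(f"excess kurtosis : {excess_kurtosis:.4f}")
```

The reported values (73 distinct values, mean 53.988752, variance 444.35875, kurtosis -1.2022395) lie close to these reference figures.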
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
27 13970
 
1.4%
23 13962
 
1.4%
71 13939
 
1.4%
22 13874
 
1.4%
39 13843
 
1.4%
77 13832
 
1.4%
67 13826
 
1.4%
78 13818
 
1.4%
53 13818
 
1.4%
32 13817
 
1.4%
Other values (63) 861301
86.1%
ValueCountFrequency (%)
18 13704
1.4%
19 13619
1.4%
20 13744
1.4%
21 13578
1.4%
22 13874
1.4%
23 13962
1.4%
24 13751
1.4%
25 13647
1.4%
26 13608
1.4%
27 13970
1.4%
ValueCountFrequency (%)
90 13664
1.4%
89 13720
1.4%
88 13590
1.4%
87 13679
1.4%
86 13705
1.4%
85 13814
1.4%
84 13554
1.4%
83 13759
1.4%
82 13684
1.4%
81 13781
1.4%

Patient_Gender
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
Male
500053 
Female
499947 

Length

Max length6
Median length4
Mean length4.999894
Min length4
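The mean length is a count-weighted average of the lengths of the two category values. A minimal check using the counts reported for this variable:

```python
# Counts copied from the Patient_Gender overview.
gender_counts = {"Male": 500053, "Female": 499947}

n_rows = sum(gender_counts.values())
total_chars = sum(len(value) * count for value, count in gender_counts.items())
mean_length = total_chars / n_rows

print("total characters:", total_chars)
print("mean length:", mean_length)
```

Because "Male" covers slightly more than half of the rows, the median length is 4 even though the mean is just under 5.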

Characters and Unicode

Total characters4999894
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowFemale
2nd rowFemale
3rd rowMale
4th rowMale
5th rowFemale

Common Values

ValueCountFrequency (%)
Male 500053
50.0%
Female 499947
50.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
male 500053
50.0%
female 499947
50.0%

Most occurring characters

ValueCountFrequency (%)
e 1499947
30.0%
a 1000000
20.0%
l 1000000
20.0%
M 500053
 
10.0%
F 499947
 
10.0%
m 499947
 
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3999894
80.0%
Uppercase Letter 1000000
 
20.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1499947
37.5%
a 1000000
25.0%
l 1000000
25.0%
m 499947
 
12.5%
Uppercase Letter
ValueCountFrequency (%)
M 500053
50.0%
F 499947
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4999894
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1499947
30.0%
a 1000000
20.0%
l 1000000
20.0%
M 500053
 
10.0%
F 499947
 
10.0%
m 499947
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4999894
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1499947
30.0%
a 1000000
20.0%
l 1000000
20.0%
M 500053
 
10.0%
F 499947
 
10.0%
m 499947
 
10.0%

Fraud_Label
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
0
500449 
1
499551 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1000000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 500449
50.0%
1 499551
50.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 500449
50.0%
1 499551
50.0%

Most occurring characters

ValueCountFrequency (%)
0 500449
50.0%
1 499551
50.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1000000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 500449
50.0%
1 499551
50.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1000000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 500449
50.0%
1 499551
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1000000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 500449
50.0%
1 499551
50.0%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
Under investigation
333505 
Suspicious
333351 
Cleared
333144 

Length

Max length19
Median length10
Mean length12.002113
Min length7

Characters and Unicode

Total characters12002113
Distinct characters19
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCleared
2nd rowUnder investigation
3rd rowCleared
4th rowSuspicious
5th rowUnder investigation

Common Values

ValueCountFrequency (%)
Under investigation 333505
33.4%
Suspicious 333351
33.3%
Cleared 333144
33.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
under 333505
25.0%
investigation 333505
25.0%
suspicious 333351
25.0%
cleared 333144
25.0%

Most occurring characters

ValueCountFrequency (%)
i 1667217
13.9%
e 1333298
11.1%
n 1000515
 
8.3%
s 1000207
 
8.3%
t 667010
 
5.6%
o 666856
 
5.6%
u 666702
 
5.6%
a 666649
 
5.6%
d 666649
 
5.6%
r 666649
 
5.6%
Other values (9) 3000361
25.0%

Most occurring categories

Value             Count     Frequency (%)
Lowercase Letter  10668608  88.9%
Uppercase Letter  1000000   8.3%
Space Separator   333505    2.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 1667217
15.6%
e 1333298
12.5%
n 1000515
9.4%
s 1000207
9.4%
t 667010
6.3%
o 666856
 
6.3%
u 666702
 
6.2%
a 666649
 
6.2%
d 666649
 
6.2%
r 666649
 
6.2%
Other values (5) 1666856
15.6%
Uppercase Letter
ValueCountFrequency (%)
U 333505
33.4%
S 333351
33.3%
C 333144
33.3%
Space Separator
ValueCountFrequency (%)
333505
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11668608
97.2%
Common 333505
 
2.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 1667217
14.3%
e 1333298
11.4%
n 1000515
 
8.6%
s 1000207
 
8.6%
t 667010
 
5.7%
o 666856
 
5.7%
u 666702
 
5.7%
a 666649
 
5.7%
d 666649
 
5.7%
r 666649
 
5.7%
Other values (8) 2666856
22.9%
Common
ValueCountFrequency (%)
333505
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12002113
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 1667217
13.9%
e 1333298
11.1%
n 1000515
 
8.3%
s 1000207
 
8.3%
t 667010
 
5.6%
o 666856
 
5.6%
u 666702
 
5.6%
a 666649
 
5.6%
d 666649
 
5.6%
r 666649
 
5.6%
Other values (9) 3000361
25.0%

Policy_Type
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
HMO
500937 
PPO
499063 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters3000000
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowHMO
2nd rowPPO
3rd rowHMO
4th rowPPO
5th rowPPO

Common Values

Value  Count   Frequency (%)
HMO    500937  50.1%
PPO    499063  49.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hmo 500937
50.1%
ppo 499063
49.9%

Most occurring characters

ValueCountFrequency (%)
O 1000000
33.3%
P 998126
33.3%
H 500937
16.7%
M 500937
16.7%
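
The character frequencies above follow directly from the two value counts. A minimal sketch, assuming the counts in the Common Values table and using only `collections.Counter`:

```python
from collections import Counter

# Row counts copied from the Common Values table for Policy_Type.
value_counts = {"HMO": 500937, "PPO": 499063}

# Each character contributes once per row that holds the value.
char_counts = Counter()
for value, n_rows in value_counts.items():
    for ch in value:
        char_counts[ch] += n_rows

total = sum(char_counts.values())
for ch, n_chars in char_counts.most_common():
    print(f"{ch} {n_chars} {100 * n_chars / total:.1f}%")
```

"P" is counted twice per PPO row, which is why its total is 2 × 499063 = 998126 rather than 499063.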

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 3000000
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
O 1000000
33.3%
P 998126
33.3%
H 500937
16.7%
M 500937
16.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 3000000
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
O 1000000
33.3%
P 998126
33.3%
H 500937
16.7%
M 500937
16.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3000000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
O 1000000
33.3%
P 998126
33.3%
H 500937
16.7%
M 500937
16.7%

Coverage_Amount
Real number (ℝ)

Distinct367043
Distinct (%)36.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2999.7301
Minimum1000
Maximum5000
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum1000
5-th percentile1200.87
Q12000.05
median2999.905
Q33998.2325
95-th percentile4799.5605
Maximum5000
Range4000
Interquartile range (IQR)1998.1825

Descriptive statistics

Standard deviation1154.2059
Coefficient of variation (CV)0.3847699
Kurtosis-1.1989104
Mean2999.7301
Median Absolute Deviation (MAD)999.125
Skewness1.4091863 × 10⁻⁵
Sum2.9997301 × 10⁹
Variance1332191.1
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
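
Several of the Coverage_Amount figures are simple functions of the others, which gives a quick consistency check. The sketch below recomputes them from the rounded values printed in the report, so small differences in the last digits are expected; the variable names are illustrative only.

```python
import math

# Figures copied from the Coverage_Amount statistics in the report.
minimum, maximum = 1000.0, 5000.0
q1, q3 = 2000.05, 3998.2325
mean, std = 2999.7301, 1154.2059

value_range = maximum - minimum  # reported Range: 4000
iqr = q3 - q1                    # reported IQR: 1998.1825
cv = std / mean                  # reported CV: 0.3847699
variance = std ** 2              # reported Variance: 1332191.1

print(f"range={value_range:g} iqr={iqr:.4f} cv={cv:.7f} variance={variance:.1f}")

# The recomputed values agree with the report up to rounding of the inputs.
assert math.isclose(variance, 1332191.1, rel_tol=1e-6)
```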
Common values

Value                  Count   Frequency (%)
4364.56                13      < 0.1%
1624.74                13      < 0.1%
4380.77                12      < 0.1%
3957.21                12      < 0.1%
2456.5                 12      < 0.1%
1286.24                12      < 0.1%
2992.39                12      < 0.1%
4464.15                12      < 0.1%
1420.82                12      < 0.1%
2544.41                11      < 0.1%
Other values (367033)  999879  > 99.9%

Minimum 10 values

Value    Count  Frequency (%)
1000     1      < 0.1%
1000.02  2      < 0.1%
1000.03  2      < 0.1%
1000.05  3      < 0.1%
1000.06  4      < 0.1%
1000.07  2      < 0.1%
1000.08  2      < 0.1%
1000.09  1      < 0.1%
1000.11  1      < 0.1%
1000.12  4      < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
5000     1      < 0.1%
4999.98  1      < 0.1%
4999.97  1      < 0.1%
4999.96  5      < 0.1%
4999.95  2      < 0.1%
4999.94  4      < 0.1%
4999.93  1      < 0.1%
4999.92  3      < 0.1%
4999.91  5      < 0.1%
4999.9   6      < 0.1%

Total_Charges
Real number (ℝ)

Distinct629457
Distinct (%)62.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5047.0554
Minimum100.03
Maximum9999.99
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum100.03
5-th percentile593.3895
Q12568.77
median5045.89
Q37524.6475
95-th percentile9500.7
Maximum9999.99
Range9899.96
Interquartile range (IQR)4955.8775

Descriptive statistics

Standard deviation2859.1245
Coefficient of variation (CV)0.56649358
Kurtosis-1.202511
Mean5047.0554
Median Absolute Deviation (MAD)2477.84
Skewness0.0014312172
Sum5.0470554 × 10⁹
Variance8174592.8
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
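
One way to read the Total_Charges statistics is to compare them with the closed-form moments of a continuous uniform distribution on [minimum, maximum]. This is only an observation, not something the report states: the sketch assumes the reported kurtosis is excess (Fisher) kurtosis and simply prints both sets of numbers side by side.

```python
import math

# Minimum and maximum copied from the Total_Charges quantile statistics.
a, b = 100.03, 9999.99

# Closed-form moments of a continuous uniform distribution on [a, b].
uniform = {
    "mean": (a + b) / 2,
    "std": (b - a) / math.sqrt(12),
    "skewness": 0.0,
    "kurtosis": -6 / 5,  # excess kurtosis of any uniform distribution
}

# Values copied from the descriptive statistics in the report.
reported = {
    "mean": 5047.0554,
    "std": 2859.1245,
    "skewness": 0.0014312172,
    "kurtosis": -1.202511,
}

for name in uniform:
    print(f"{name:9s} uniform={uniform[name]:.4f} reported={reported[name]:.4f}")
```

The two columns being close would be consistent with uniformly generated values, but the report alone does not confirm how the column was produced.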
Common values

Value                  Count   Frequency (%)
4842.87                9       < 0.1%
980.8                  8       < 0.1%
8468.94                8       < 0.1%
1391.03                8       < 0.1%
8489.48                8       < 0.1%
4695.22                8       < 0.1%
5214.51                8       < 0.1%
1751.35                7       < 0.1%
8575.77                7       < 0.1%
6547.37                7       < 0.1%
Other values (629447)  999922  > 99.9%

Minimum 10 values

Value   Count  Frequency (%)
100.03  1      < 0.1%
100.04  2      < 0.1%
100.05  1      < 0.1%
100.06  2      < 0.1%
100.08  1      < 0.1%
100.09  1      < 0.1%
100.1   2      < 0.1%
100.11  2      < 0.1%
100.12  3      < 0.1%
100.13  1      < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
9999.99  1      < 0.1%
9999.98  1      < 0.1%
9999.95  1      < 0.1%
9999.94  1      < 0.1%
9999.92  1      < 0.1%
9999.91  1      < 0.1%
9999.9   2      < 0.1%
9999.89  1      < 0.1%
9999.88  2      < 0.1%
9999.86  1      < 0.1%

Payment_Type
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
Check
333862 
Electronic Funds Transfer
333191 
Credit Card
332947 

Length

Max length25
Median length11
Mean length13.661502
Min length5

Characters and Unicode

Total characters13661502
Distinct characters20
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCredit Card
2nd rowCredit Card
3rd rowCredit Card
4th rowCheck
5th rowElectronic Funds Transfer

Common Values

Value                      Count   Frequency (%)
Check                      333862  33.4%
Electronic Funds Transfer  333191  33.3%
Credit Card                332947  33.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
check 333862
16.7%
electronic 333191
16.7%
funds 333191
16.7%
transfer 333191
16.7%
credit 332947
16.7%
card 332947
16.7%

Most occurring characters

ValueCountFrequency (%)
r 1665467
12.2%
e 1333191
 
9.8%
c 1000244
 
7.3%
C 999756
 
7.3%
n 999573
 
7.3%
999329
 
7.3%
d 999085
 
7.3%
s 666382
 
4.9%
t 666138
 
4.9%
i 666138
 
4.9%
Other values (10) 3666199
26.8%

Most occurring categories

Value             Count     Frequency (%)
Lowercase Letter  10662844  78.1%
Uppercase Letter  1999329   14.6%
Space Separator   999329    7.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 1665467
15.6%
e 1333191
12.5%
c 1000244
9.4%
n 999573
9.4%
d 999085
9.4%
s 666382
 
6.2%
t 666138
 
6.2%
i 666138
 
6.2%
a 666138
 
6.2%
k 333862
 
3.1%
Other values (5) 1666626
15.6%
Uppercase Letter
ValueCountFrequency (%)
C 999756
50.0%
E 333191
 
16.7%
F 333191
 
16.7%
T 333191
 
16.7%
Space Separator
ValueCountFrequency (%)
999329
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12662173
92.7%
Common 999329
 
7.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 1665467
13.2%
e 1333191
10.5%
c 1000244
 
7.9%
C 999756
 
7.9%
n 999573
 
7.9%
d 999085
 
7.9%
s 666382
 
5.3%
t 666138
 
5.3%
i 666138
 
5.3%
a 666138
 
5.3%
Other values (9) 3000061
23.7%
Common
ValueCountFrequency (%)
999329
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13661502
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 1665467
12.2%
e 1333191
 
9.8%
c 1000244
 
7.3%
C 999756
 
7.3%
n 999573
 
7.3%
999329
 
7.3%
d 999085
 
7.3%
s 666382
 
4.9%
t 666138
 
4.9%
i 666138
 
4.9%
Other values (10) 3666199
26.8%

State
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
Tokyo
200487 
Mumbai
200109 
Bangkok
199945 
Beijing
199791 
Seoul
199668 

Length

Max length7
Median length6
Mean length5.999581
Min length5

Characters and Unicode

Total characters5999581
Distinct characters17
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowBangkok
2nd rowMumbai
3rd rowSeoul
4th rowTokyo
5th rowMumbai

Common Values

Value    Count   Frequency (%)
Tokyo    200487  20.0%
Mumbai   200109  20.0%
Bangkok  199945  20.0%
Beijing  199791  20.0%
Seoul    199668  20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
tokyo 200487
20.0%
mumbai 200109
20.0%
bangkok 199945
20.0%
beijing 199791
20.0%
seoul 199668
20.0%

Most occurring characters

ValueCountFrequency (%)
o 800587
13.3%
k 600377
10.0%
i 599691
10.0%
a 400054
 
6.7%
u 399777
 
6.7%
g 399736
 
6.7%
n 399736
 
6.7%
B 399736
 
6.7%
e 399459
 
6.7%
T 200487
 
3.3%
Other values (7) 1399941
23.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4999581
83.3%
Uppercase Letter 1000000
 
16.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 800587
16.0%
k 600377
12.0%
i 599691
12.0%
a 400054
8.0%
u 399777
8.0%
g 399736
8.0%
n 399736
8.0%
e 399459
8.0%
y 200487
 
4.0%
b 200109
 
4.0%
Other values (3) 599568
12.0%
Uppercase Letter
ValueCountFrequency (%)
B 399736
40.0%
T 200487
20.0%
M 200109
20.0%
S 199668
20.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5999581
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 800587
13.3%
k 600377
10.0%
i 599691
10.0%
a 400054
 
6.7%
u 399777
 
6.7%
g 399736
 
6.7%
n 399736
 
6.7%
B 399736
 
6.7%
e 399459
 
6.7%
T 200487
 
3.3%
Other values (7) 1399941
23.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5999581
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 800587
13.3%
k 600377
10.0%
i 599691
10.0%
a 400054
 
6.7%
u 399777
 
6.7%
g 399736
 
6.7%
n 399736
 
6.7%
B 399736
 
6.7%
e 399459
 
6.7%
T 200487
 
3.3%
Other values (7) 1399941
23.3%

Email
Text

Distinct522063
Distinct (%)52.2%
Missing0
Missing (%)0.0%
Memory size7.6 MiB

Length

Max length34
Median length30
Mean length21.825615
Min length15

Characters and Unicode

Total characters21825615
Distinct characters38
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique351695 ?
Unique (%)35.2%
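
"Distinct" and "Unique" measure different things here: Distinct counts how many different values occur, while Unique appears to count the values that occur exactly once, which is why it can be much smaller. The sketch below shows the distinction on a tiny hypothetical column; the repeat pattern is invented for illustration and only the addresses are taken from the report.

```python
from collections import Counter

# Hypothetical miniature Email column; repeat counts are made up.
emails = [
    "zsmith@example.com",
    "zsmith@example.com",
    "zsmith@example.com",
    "tsmith@example.net",
    "tsmith@example.net",
    "charlenekoch@example.org",
    "nharris@example.net",
]

counts = Counter(emails)
n_distinct = len(counts)                             # different values seen
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(f"distinct={n_distinct} unique={n_unique}")
```

On this toy column, four different addresses occur but only two of them occur exactly once.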

Sample

1st rowcharlenekoch@example.org
2nd rowayersmelanie@example.org
3rd rowmadison17@example.com
4th rowbrittany18@example.org
5th rownharris@example.net
Value                  Count   Frequency (%)
zsmith@example.com     96      < 0.1%
tsmith@example.net     91      < 0.1%
ismith@example.org     87      < 0.1%
csmith@example.net     86      < 0.1%
gsmith@example.org     86      < 0.1%
wsmith@example.net     83      < 0.1%
psmith@example.org     83      < 0.1%
ssmith@example.net     83      < 0.1%
dsmith@example.org     82      < 0.1%
ysmith@example.org     81      < 0.1%
Other values (522053)  999142  99.9%

Most occurring characters

ValueCountFrequency (%)
e 3302750
15.1%
a 2025063
 
9.3%
m 1676717
 
7.7%
l 1577273
 
7.2%
o 1223635
 
5.6%
r 1137302
 
5.2%
p 1129127
 
5.2%
n 1119863
 
5.1%
x 1023327
 
4.7%
@ 1000000
 
4.6%
Other values (28) 6610558
30.3%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   19325749  88.5%
Other Punctuation  2000000   9.2%
Decimal Number     499866    2.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 3302750
17.1%
a 2025063
10.5%
m 1676717
 
8.7%
l 1577273
 
8.2%
o 1223635
 
6.3%
r 1137302
 
5.9%
p 1129127
 
5.8%
n 1119863
 
5.8%
x 1023327
 
5.3%
t 757314
 
3.9%
Other values (16) 4353378
22.5%
Decimal Number
ValueCountFrequency (%)
6 50257
10.1%
3 50177
10.0%
1 50166
10.0%
7 50044
10.0%
0 50044
10.0%
5 49991
10.0%
8 49942
10.0%
2 49866
10.0%
9 49761
10.0%
4 49618
9.9%
Other Punctuation
ValueCountFrequency (%)
@ 1000000
50.0%
. 1000000
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 19325749
88.5%
Common 2499866
 
11.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 3302750
17.1%
a 2025063
10.5%
m 1676717
 
8.7%
l 1577273
 
8.2%
o 1223635
 
6.3%
r 1137302
 
5.9%
p 1129127
 
5.8%
n 1119863
 
5.8%
x 1023327
 
5.3%
t 757314
 
3.9%
Other values (16) 4353378
22.5%
Common
ValueCountFrequency (%)
@ 1000000
40.0%
. 1000000
40.0%
6 50257
 
2.0%
3 50177
 
2.0%
1 50166
 
2.0%
7 50044
 
2.0%
0 50044
 
2.0%
5 49991
 
2.0%
8 49942
 
2.0%
2 49866
 
2.0%
Other values (2) 99379
 
4.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21825615
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 3302750
15.1%
a 2025063
 
9.3%
m 1676717
 
7.7%
l 1577273
 
7.2%
o 1223635
 
5.6%
r 1137302
 
5.2%
p 1129127
 
5.2%
n 1119863
 
5.1%
x 1023327
 
4.7%
@ 1000000
 
4.6%
Other values (28) 6610558
30.3%

Phone_Number
Text

UNIQUE 

Distinct1000000
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Memory size7.6 MiB

Length

Max length22
Median length19
Mean length16.164689
Min length10

Characters and Unicode

Total characters16164689
Distinct characters16
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
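
To see where seven categories can come from in a phone-number column, the sketch below groups the characters of a few strings by Unicode general category with the standard `unicodedata` module. The first three strings are sample rows from the report; the last one, with a leading "+", is a hypothetical value added so that every category is represented.

```python
import unicodedata
from collections import defaultdict

phone_numbers = [
    "737.572.4230",
    "001-284-213-6827x6429",
    "(320)856-6983",
    "+1-555-010-0000",  # hypothetical value, not taken from the report
]

# Map each general-category code to the set of characters that fall in it.
chars_by_category = defaultdict(set)
for number in phone_numbers:
    for ch in number:
        chars_by_category[unicodedata.category(ch)].add(ch)

# Nd = Decimal Number, Pd = Dash Punctuation, Po = Other Punctuation,
# Ps = Open Punctuation, Pe = Close Punctuation, Ll = Lowercase Letter,
# Sm = Math Symbol
for code in sorted(chars_by_category):
    print(code, "".join(sorted(chars_by_category[code])))
```

The extension marker "x" is the only letter, which is why the Latin script and the Lowercase Letter category show identical counts for this variable.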

Unique

Unique1000000 ?
Unique (%)100.0%

Sample

1st row737.572.4230
2nd row001-284-213-6827x6429
3rd row(320)856-6983
4th row860-217-1502
5th row658.620.1024
Value                  Count   Frequency (%)
737.572.4230           1       < 0.1%
3583554754             1       < 0.1%
331.692.3101           1       < 0.1%
001-815-565-2083x183   1       < 0.1%
320)856-6983           1       < 0.1%
860-217-1502           1       < 0.1%
658.620.1024           1       < 0.1%
001-658-466-2696       1       < 0.1%
001-729-848-5510x689   1       < 0.1%
712)653-1749x486       1       < 0.1%
Other values (999990)  999990  > 99.9%

Most occurring characters

ValueCountFrequency (%)
- 1561496
9.7%
0 1361012
8.4%
1 1360874
8.4%
3 1291978
8.0%
7 1291304
8.0%
9 1291024
8.0%
5 1289568
8.0%
2 1289519
8.0%
6 1289331
8.0%
8 1289199
8.0%
Other values (6) 2849384
17.6%

Most occurring categories

Value              Count     Frequency (%)
Decimal Number     13042315  80.7%
Dash Punctuation   1561496   9.7%
Lowercase Letter   600600    3.7%
Other Punctuation  398746    2.5%
Open Punctuation   200595    1.2%
Close Punctuation  200595    1.2%
Math Symbol        160342    1.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1361012
10.4%
1 1360874
10.4%
3 1291978
9.9%
7 1291304
9.9%
9 1291024
9.9%
5 1289568
9.9%
2 1289519
9.9%
6 1289331
9.9%
8 1289199
9.9%
4 1288506
9.9%
Dash Punctuation
ValueCountFrequency (%)
- 1561496
100.0%
Lowercase Letter
ValueCountFrequency (%)
x 600600
100.0%
Other Punctuation
ValueCountFrequency (%)
. 398746
100.0%
Open Punctuation
ValueCountFrequency (%)
( 200595
100.0%
Close Punctuation
ValueCountFrequency (%)
) 200595
100.0%
Math Symbol
ValueCountFrequency (%)
+ 160342
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15564089
96.3%
Latin 600600
 
3.7%

Most frequent character per script

Common
ValueCountFrequency (%)
- 1561496
10.0%
0 1361012
8.7%
1 1360874
8.7%
3 1291978
8.3%
7 1291304
8.3%
9 1291024
8.3%
5 1289568
8.3%
2 1289519
8.3%
6 1289331
8.3%
8 1289199
8.3%
Other values (5) 2248784
14.4%
Latin
ValueCountFrequency (%)
x 600600
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16164689
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 1561496
9.7%
0 1361012
8.4%
1 1360874
8.4%
3 1291978
8.0%
7 1291304
8.0%
9 1291024
8.0%
5 1289568
8.0%
2 1289519
8.0%
6 1289331
8.0%
8 1289199
8.0%
Other values (6) 2849384
17.6%
Distinct999998
Distinct (%)> 99.9%
Missing0
Missing (%)0.0%
Memory size7.6 MiB

Length

Max length70
Median length61
Mean length44.709032
Min length20

Characters and Unicode

Total characters44709032
Distinct characters65
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique999996 ?
Unique (%)> 99.9%

Sample

1st row9475 Christine Fort, Riveraview, TX 28683
2nd row012 Martinez Bridge, Popeview, OK 75771
3rd row8544 Roberts Estate Apt. 392, Port Mistyshire, WY 86425
4th row52151 Antonio Hill Suite 655, Lake Christian, NH 49512
5th row7112 Christopher Village Suite 120, North Emily, NJ 46503
Value                  Count    Frequency (%)
apt                    223663   3.0%
suite                  223393   3.0%
port                   72202    1.0%
box                    72052    1.0%
lake                   67628    0.9%
west                   64431    0.9%
south                  64078    0.9%
east                   63875    0.9%
north                  63853    0.9%
new                    63241    0.9%
Other values (141157)  6398638  86.7%
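
The table above counts lower-cased words rather than whole values. The report does not show the tokenizer it uses, so the following is only a sketch under an assumption: split on whitespace, lower-case, and strip leading and trailing punctuation from each token. It is applied to the five sample rows of this variable; `tokenize` is a hypothetical helper.

```python
import string
from collections import Counter

# The five sample rows shown for this variable.
addresses = [
    "9475 Christine Fort, Riveraview, TX 28683",
    "012 Martinez Bridge, Popeview, OK 75771",
    "8544 Roberts Estate Apt. 392, Port Mistyshire, WY 86425",
    "52151 Antonio Hill Suite 655, Lake Christian, NH 49512",
    "7112 Christopher Village Suite 120, North Emily, NJ 46503",
]

def tokenize(text):
    """Lower-case, split on whitespace, strip edge punctuation (assumed rule)."""
    tokens = (token.strip(string.punctuation) for token in text.lower().split())
    return [token for token in tokens if token]

word_counts = Counter(word for address in addresses for word in tokenize(address))
print(word_counts.most_common(5))
```

Stripping only the edges of each token would also explain entries such as `320)856-6983` in the Phone_Number table, where the opening parenthesis is gone but the inner one remains.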

Most occurring characters

ValueCountFrequency (%)
6377054
 
14.3%
e 2299219
 
5.1%
, 1928113
 
4.3%
a 1855956
 
4.2%
t 1776293
 
4.0%
r 1659075
 
3.7%
i 1462249
 
3.3%
o 1461518
 
3.3%
n 1389740
 
3.1%
s 1192085
 
2.7%
Other values (55) 23307730
52.1%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   19406357  43.4%
Decimal Number     10485857  23.5%
Space Separator    6377054   14.3%
Uppercase Letter   6287988   14.1%
Other Punctuation  2151776   4.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 2299219
11.8%
a 1855956
9.6%
t 1776293
9.2%
r 1659075
 
8.5%
i 1462249
 
7.5%
o 1461518
 
7.5%
n 1389740
 
7.2%
s 1192085
 
6.1%
l 1010170
 
5.2%
h 862985
 
4.4%
Other values (16) 4437067
22.9%
Uppercase Letter
ValueCountFrequency (%)
A 735093
 
11.7%
S 699519
 
11.1%
P 460003
 
7.3%
M 431983
 
6.9%
C 385746
 
6.1%
N 350069
 
5.6%
L 254999
 
4.1%
D 254318
 
4.0%
R 243166
 
3.9%
W 243058
 
3.9%
Other values (16) 2230034
35.5%
Decimal Number
ValueCountFrequency (%)
5 1051646
10.0%
6 1050617
10.0%
8 1050387
10.0%
4 1050125
10.0%
7 1049378
10.0%
3 1049141
10.0%
1 1048331
10.0%
2 1047967
10.0%
9 1047842
10.0%
0 1040423
9.9%
Other Punctuation
ValueCountFrequency (%)
, 1928113
89.6%
. 223663
 
10.4%
Space Separator
ValueCountFrequency (%)
6377054
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 25694345
57.5%
Common 19014687
42.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 2299219
 
8.9%
a 1855956
 
7.2%
t 1776293
 
6.9%
r 1659075
 
6.5%
i 1462249
 
5.7%
o 1461518
 
5.7%
n 1389740
 
5.4%
s 1192085
 
4.6%
l 1010170
 
3.9%
h 862985
 
3.4%
Other values (42) 10725055
41.7%
Common
ValueCountFrequency (%)
6377054
33.5%
, 1928113
 
10.1%
5 1051646
 
5.5%
6 1050617
 
5.5%
8 1050387
 
5.5%
4 1050125
 
5.5%
7 1049378
 
5.5%
3 1049141
 
5.5%
1 1048331
 
5.5%
2 1047967
 
5.5%
Other values (3) 2311928
 
12.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 44709032
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6377054
 
14.3%
e 2299219
 
5.1%
, 1928113
 
4.3%
a 1855956
 
4.2%
t 1776293
 
4.0%
r 1659075
 
3.7%
i 1462249
 
3.3%
o 1461518
 
3.3%
n 1389740
 
3.1%
s 1192085
 
2.7%
Other values (55) 23307730
52.1%

Nationality
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
Japanese
201001 
Thai
200063 
Korean
199830 
Chinese
199793 
Indian
199313 

Length

Max length8
Median length7
Mean length6.201669
Min length4

Characters and Unicode

Total characters6201669
Distinct characters15
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowKorean
2nd rowKorean
3rd rowIndian
4th rowThai
5th rowThai

Common Values

Value     Count   Frequency (%)
Japanese  201001  20.1%
Thai      200063  20.0%
Korean    199830  20.0%
Chinese   199793  20.0%
Indian    199313  19.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
japanese 201001
20.1%
thai 200063
20.0%
korean 199830
20.0%
chinese 199793
20.0%
indian 199313
19.9%

Most occurring characters

ValueCountFrequency (%)
e 1001418
16.1%
a 1001208
16.1%
n 999250
16.1%
i 599169
9.7%
s 400794
6.5%
h 399856
 
6.4%
J 201001
 
3.2%
p 201001
 
3.2%
T 200063
 
3.2%
K 199830
 
3.2%
Other values (5) 998079
16.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5201669
83.9%
Uppercase Letter 1000000
 
16.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1001418
19.3%
a 1001208
19.2%
n 999250
19.2%
i 599169
11.5%
s 400794
7.7%
h 399856
 
7.7%
p 201001
 
3.9%
o 199830
 
3.8%
r 199830
 
3.8%
d 199313
 
3.8%
Uppercase Letter
ValueCountFrequency (%)
J 201001
20.1%
T 200063
20.0%
K 199830
20.0%
C 199793
20.0%
I 199313
19.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 6201669
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1001418
16.1%
a 1001208
16.1%
n 999250
16.1%
i 599169
9.7%
s 400794
6.5%
h 399856
 
6.4%
J 201001
 
3.2%
p 201001
 
3.2%
T 200063
 
3.2%
K 199830
 
3.2%
Other values (5) 998079
16.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6201669
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1001418
16.1%
a 1001208
16.1%
n 999250
16.1%
i 599169
9.7%
s 400794
6.5%
h 399856
 
6.4%
J 201001
 
3.2%
p 201001
 
3.2%
T 200063
 
3.2%
K 199830
 
3.2%
Other values (5) 998079
16.1%
Distinct999457
Distinct (%)99.9%
Missing0
Missing (%)0.0%
Memory size7.6 MiB

Length

Max length11
Median length11
Mean length11
Min length11

Characters and Unicode

Total characters11000000
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique998914 ?
Unique (%)99.9%

Sample

1st row317-45-4815
2nd row255-16-5382
3rd row255-07-5680
4th row210-10-2570
5th row044-28-8553
Value                  Count   Frequency (%)
531-17-4217            2       < 0.1%
734-90-6781            2       < 0.1%
144-98-1327            2       < 0.1%
278-81-0975            2       < 0.1%
779-37-8948            2       < 0.1%
648-07-1991            2       < 0.1%
699-28-0936            2       < 0.1%
402-36-5255            2       < 0.1%
479-51-5826            2       < 0.1%
065-65-8309            2       < 0.1%
Other values (999447)  999980  > 99.9%

Most occurring characters

ValueCountFrequency (%)
- 2000000
18.2%
7 915769
8.3%
4 915162
8.3%
1 914201
8.3%
5 913177
8.3%
6 913082
8.3%
3 913017
8.3%
2 912359
8.3%
8 911622
8.3%
0 888901
8.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 9000000
81.8%
Dash Punctuation 2000000
 
18.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
7 915769
10.2%
4 915162
10.2%
1 914201
10.2%
5 913177
10.1%
6 913082
10.1%
3 913017
10.1%
2 912359
10.1%
8 911622
10.1%
0 888901
9.9%
9 802710
8.9%
Dash Punctuation
ValueCountFrequency (%)
- 2000000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 11000000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
- 2000000
18.2%
7 915769
8.3%
4 915162
8.3%
1 914201
8.3%
5 913177
8.3%
6 913082
8.3%
3 913017
8.3%
2 912359
8.3%
8 911622
8.3%
0 888901
8.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11000000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 2000000
18.2%
7 915769
8.3%
4 915162
8.3%
1 914201
8.3%
5 913177
8.3%
6 913082
8.3%
3 913017
8.3%
2 912359
8.3%
8 911622
8.3%
0 888901
8.1%
Distinct535712
Distinct (%)53.6%
Missing0
Missing (%)0.0%
Memory size7.6 MiB

Length

Max length38
Median length33
Mean length16.536182
Min length5

Characters and Unicode

Total characters16536182
Distinct characters54
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique471426 ?
Unique (%)47.1%

Sample

1st rowMitchell-Mcintosh
2nd rowGalloway, Castillo and Smith
3rd rowPhillips, Bryant and Murphy
4th rowLee, Jackson and Hoffman
5th rowPeterson, Lopez and Blake
Value                  Count    Frequency (%)
and                    389443   16.3%
plc                    55884    2.3%
sons                   55868    2.3%
ltd                    55569    2.3%
inc                    55517    2.3%
llc                    55473    2.3%
group                  55150    2.3%
smith                  28877    1.2%
johnson                22437    0.9%
williams               18685    0.8%
Other values (199047)  1597151  66.8%

Most occurring characters

ValueCountFrequency (%)
n 1523592
 
9.2%
1390054
 
8.4%
a 1374546
 
8.3%
e 1189066
 
7.2%
r 1075569
 
6.5%
o 1062224
 
6.4%
s 785990
 
4.8%
d 728511
 
4.4%
l 719158
 
4.3%
i 671214
 
4.1%
Other values (44) 6016258
36.4%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   11923300  72.1%
Uppercase Letter   2556289   15.5%
Space Separator    1390054   8.4%
Other Punctuation  333575    2.0%
Dash Punctuation   332964    2.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 1523592
12.8%
a 1374546
11.5%
e 1189066
10.0%
r 1075569
9.0%
o 1062224
8.9%
s 785990
 
6.6%
d 728511
 
6.1%
l 719158
 
6.0%
i 671214
 
5.6%
t 493676
 
4.1%
Other values (16) 2299754
19.3%
Uppercase Letter
ValueCountFrequency (%)
L 304438
11.9%
C 262203
10.3%
S 228105
 
8.9%
M 209373
 
8.2%
G 169790
 
6.6%
W 163436
 
6.4%
B 161440
 
6.3%
H 161302
 
6.3%
P 152631
 
6.0%
R 137733
 
5.4%
Other values (15) 605838
23.7%
Space Separator
ValueCountFrequency (%)
1390054
100.0%
Other Punctuation
ValueCountFrequency (%)
, 333575
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 332964
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 14479589
87.6%
Common 2056593
 
12.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 1523592
 
10.5%
a 1374546
 
9.5%
e 1189066
 
8.2%
r 1075569
 
7.4%
o 1062224
 
7.3%
s 785990
 
5.4%
d 728511
 
5.0%
l 719158
 
5.0%
i 671214
 
4.6%
t 493676
 
3.4%
Other values (41) 4856043
33.5%
Common
ValueCountFrequency (%)
1390054
67.6%
, 333575
 
16.2%
- 332964
 
16.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16536182
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 1523592
 
9.2%
1390054
 
8.4%
a 1374546
 
8.3%
e 1189066
 
7.2%
r 1075569
 
6.5%
o 1062224
 
6.4%
s 785990
 
4.8%
d 728511
 
4.4%
l 719158
 
4.3%
i 671214
 
4.1%
Other values (44) 6016258
36.4%

Occupation
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
Businessman
200187 
Engineer
200149 
Teacher
200113 
Artist
199966 
Doctor
199585 

Length

Max length11
Median length8
Mean length7.601346
Min length6

Characters and Unicode

Total characters7601346
Distinct characters18
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowTeacher
2nd rowBusinessman
3rd rowTeacher
4th rowDoctor
5th rowEngineer

Common Values

Value        Count   Frequency (%)
Businessman  200187  20.0%
Engineer     200149  20.0%
Teacher      200113  20.0%
Artist       199966  20.0%
Doctor       199585  20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
businessman 200187
20.0%
engineer 200149
20.0%
teacher 200113
20.0%
artist 199966
20.0%
doctor 199585
20.0%

Most occurring characters

ValueCountFrequency (%)
e 1000711
13.2%
n 800672
10.5%
s 800527
10.5%
r 799813
10.5%
i 600302
 
7.9%
t 599517
 
7.9%
a 400300
 
5.3%
c 399698
 
5.3%
o 399170
 
5.3%
B 200187
 
2.6%
Other values (8) 1600449
21.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6601346
86.8%
Uppercase Letter 1000000
 
13.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1000711
15.2%
n 800672
12.1%
s 800527
12.1%
r 799813
12.1%
i 600302
9.1%
t 599517
9.1%
a 400300
6.1%
c 399698
 
6.1%
o 399170
 
6.0%
u 200187
 
3.0%
Other values (3) 600449
9.1%
Uppercase Letter
ValueCountFrequency (%)
B 200187
20.0%
E 200149
20.0%
T 200113
20.0%
A 199966
20.0%
D 199585
20.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7601346
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1000711
13.2%
n 800672
10.5%
s 800527
10.5%
r 799813
10.5%
i 600302
 
7.9%
t 599517
 
7.9%
a 400300
 
5.3%
c 399698
 
5.3%
o 399170
 
5.3%
B 200187
 
2.6%
Other values (8) 1600449
21.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7601346
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1000711
13.2%
n 800672
10.5%
s 800527
10.5%
r 799813
10.5%
i 600302
 
7.9%
t 599517
 
7.9%
a 400300
 
5.3%
c 399698
 
5.3%
o 399170
 
5.3%
B 200187
 
2.6%
Other values (8) 1600449
21.1%

Marital_Status
Categorical

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
Divorced
250387 
Single
250280 
Widowed
249678 
Married
249655 

Length

Max length8
Median length7
Mean length7.000107
Min length6
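
The length statistics above can be recomputed from the four value counts alone. A minimal sketch, assuming the counts listed for Marital_Status; `weighted_median_length` is a hypothetical helper that returns the lower median, which is enough here because both middle rows have the same length.

```python
# Row counts copied from the Marital_Status value listing.
value_counts = {
    "Divorced": 250387,
    "Single": 250280,
    "Widowed": 249678,
    "Married": 249655,
}

n_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows
min_length = min(len(value) for value in value_counts)
max_length = max(len(value) for value in value_counts)

def weighted_median_length(counts):
    """Walk the lengths in ascending order until half of the rows are covered."""
    half = sum(counts.values()) / 2
    covered = 0
    for length, count in sorted((len(v), c) for v, c in counts.items()):
        covered += count
        if covered >= half:
            return length

median_length = weighted_median_length(value_counts)
print(min_length, median_length, mean_length, max_length)
```

The total of 7000107 characters is the same figure the report lists under "Total characters" for this variable.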

Characters and Unicode

Total characters7000107
Distinct characters16
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowDivorced
2nd rowMarried
3rd rowDivorced
4th rowDivorced
5th rowDivorced

Common Values

Value     Count   Frequency (%)
Divorced  250387  25.0%
Single    250280  25.0%
Widowed   249678  25.0%
Married   249655  25.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
divorced 250387
25.0%
single 250280
25.0%
widowed 249678
25.0%
married 249655
25.0%

Most occurring characters

ValueCountFrequency (%)
i 1000000
14.3%
e 1000000
14.3%
d 999398
14.3%
r 749697
10.7%
o 500065
 
7.1%
D 250387
 
3.6%
v 250387
 
3.6%
c 250387
 
3.6%
S 250280
 
3.6%
n 250280
 
3.6%
Other values (6) 1499226
21.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6000107
85.7%
Uppercase Letter 1000000
 
14.3%

Most frequent character per category

Lowercase Letter
Value             Count    Frequency (%)
i                 1000000  16.7%
e                 1000000  16.7%
d                 999398   16.7%
r                 749697   12.5%
o                 500065   8.3%
v                 250387   4.2%
c                 250387   4.2%
n                 250280   4.2%
g                 250280   4.2%
l                 250280   4.2%
Other values (2)  499333   8.3%
Uppercase Letter
Value             Count    Frequency (%)
D                 250387   25.0%
S                 250280   25.0%
W                 249678   25.0%
M                 249655   25.0%

Most occurring scripts

Value             Count    Frequency (%)
Latin             7000107  100.0%

Most frequent character per script

Latin
Value             Count    Frequency (%)
i                 1000000  14.3%
e                 1000000  14.3%
d                 999398   14.3%
r                 749697   10.7%
o                 500065   7.1%
D                 250387   3.6%
v                 250387   3.6%
c                 250387   3.6%
S                 250280   3.6%
n                 250280   3.6%
Other values (6)  1499226  21.4%

Most occurring blocks

Value             Count    Frequency (%)
ASCII             7000107  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
i                 1000000  14.3%
e                 1000000  14.3%
d                 999398   14.3%
r                 749697   10.7%
o                 500065   7.1%
D                 250387   3.6%
v                 250387   3.6%
c                 250387   3.6%
S                 250280   3.6%
n                 250280   3.6%
Other values (6)  1499226  21.4%

Education_Level
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.6 MiB

Value             Count
PhD               250678
High School       250441
Bachelor          249763
Master            249118

Length

Max length: 11
Median length: 8
Mean length: 6.999697
Min length: 3
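
The length statistics above follow directly from the value counts. The sketch below reproduces them for Education_Level using only the category labels and counts from this section. The helper name `length_stats` is hypothetical, and the median is computed as a simple lower weighted median, which is an assumption about how the report derives it.

```python
# Value counts copied from the Education_Level section of this report.
value_counts = {
    "PhD": 250678,
    "High School": 250441,
    "Bachelor": 249763,
    "Master": 249118,
}


def length_stats(counts):
    """Return length statistics for a categorical column given its value counts.

    Hypothetical helper for illustration; not part of ydata-profiling.
    """
    n_rows = sum(counts.values())
    total_chars = sum(len(value) * n for value, n in counts.items())

    # Lower weighted median: walk the lengths in ascending order until at
    # least half of the rows are covered.
    median = None
    seen = 0
    for length, n in sorted((len(v), n) for v, n in counts.items()):
        seen += n
        if seen * 2 >= n_rows:
            median = length
            break

    lengths = [len(value) for value in counts]
    return {
        "max": max(lengths),
        "median": median,
        "mean": total_chars / n_rows,
        "min": min(lengths),
        "total_chars": total_chars,
    }


stats = length_stats(value_counts)
print(stats)
```

The mean length multiplied by the number of observations gives the "Total characters" figure reported for this variable.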

Characters and Unicode

Total characters: 6999697
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: High School
2nd row: PhD
3rd row: Master
4th row: Bachelor
5th row: Bachelor

Common Values

Value             Count    Frequency (%)
PhD               250678   25.1%
High School       250441   25.0%
Bachelor          249763   25.0%
Master            249118   24.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value             Count    Frequency (%)
phd               250678   20.0%
high              250441   20.0%
school            250441   20.0%
bachelor          249763   20.0%
master            249118   19.9%

Most occurring characters

Value             Count    Frequency (%)
h                 1001323  14.3%
o                 750645   10.7%
c                 500204   7.1%
l                 500204   7.1%
r                 498881   7.1%
e                 498881   7.1%
a                 498881   7.1%
P                 250678   3.6%
D                 250678   3.6%
S                 250441   3.6%
Other values (8)  1998881  28.6%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  5248137  75.0%
Uppercase Letter  1501119  21.4%
Space Separator   250441   3.6%

Most frequent character per category

Lowercase Letter
Value             Count    Frequency (%)
h                 1001323  19.1%
o                 750645   14.3%
c                 500204   9.5%
l                 500204   9.5%
r                 498881   9.5%
e                 498881   9.5%
a                 498881   9.5%
g                 250441   4.8%
i                 250441   4.8%
s                 249118   4.7%
Uppercase Letter
Value             Count    Frequency (%)
P                 250678   16.7%
D                 250678   16.7%
S                 250441   16.7%
H                 250441   16.7%
B                 249763   16.6%
M                 249118   16.6%
Space Separator
Value             Count    Frequency (%)
(space)           250441   100.0%

Most occurring scripts

Value             Count    Frequency (%)
Latin             6749256  96.4%
Common            250441   3.6%

Most frequent character per script

Latin
Value             Count    Frequency (%)
h                 1001323  14.8%
o                 750645   11.1%
c                 500204   7.4%
l                 500204   7.4%
r                 498881   7.4%
e                 498881   7.4%
a                 498881   7.4%
P                 250678   3.7%
D                 250678   3.7%
S                 250441   3.7%
Other values (7)  1748440  25.9%
Common
Value             Count    Frequency (%)
(space)           250441   100.0%

Most occurring blocks

Value             Count    Frequency (%)
ASCII             6999697  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
h                 1001323  14.3%
o                 750645   10.7%
c                 500204   7.1%
l                 500204   7.1%
r                 498881   7.1%
e                 498881   7.1%
a                 498881   7.1%
P                 250678   3.6%
D                 250678   3.6%
S                 250441   3.6%
Other values (8)  1998881  28.6%

Interactions

[Figures: 25 interaction plots, not reproduced here]

Correlations

[Figure: correlation matrix plot]

Variable Admission_Date Claim_Amount Coverage_Amount Discharge_Date Education_Level Fraud_Label Investigation_Details Marital_Status Nationality Occupation Paid_Amount Patient_Age Patient_Gender Payment_Type Policy_Type Provider_ID Provider_Specialty State Total_Charges
Admission_Date 1.000 0.001 -0.001 0.000 0.000 0.000 0.000 0.002 0.000 0.000 -0.001 0.000 0.000 0.000 0.000 0.000 0.001 0.000 -0.001
Claim_Amount 0.001 1.000 0.001 0.000 0.001 0.001 0.000 0.003 0.000 0.000 -0.001 0.000 0.000 0.002 0.000 0.000 0.000 0.000 -0.000
Coverage_Amount -0.001 0.001 1.000 0.002 0.000 0.000 0.002 0.001 0.001 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.002 0.000
Discharge_Date 0.000 0.000 0.002 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.001 0.002 0.001 0.003 0.000 0.000 0.000 0.001
Education_Level 0.000 0.001 0.000 0.000 1.000 0.001 0.002 0.002 0.000 0.002 -0.001 0.000 0.000 0.002 0.000 0.000 0.000 0.000 -0.001
Fraud_Label 0.000 0.001 0.000 0.000 0.001 1.000 0.001 0.000 0.001 0.000 0.000 0.001 0.000 0.000 0.003 0.000 0.001 0.000 0.001
Investigation_Details 0.000 0.000 0.002 0.000 0.002 0.001 1.000 0.000 0.000 0.000 -0.000 0.000 0.000 0.000 0.002 0.001 0.000 0.001 -0.002
Marital_Status 0.002 0.003 0.001 0.000 0.002 0.000 0.000 1.000 0.000 0.000 0.001 0.001 0.000 0.002 0.000 0.000 0.001 0.001 0.000
Nationality 0.000 0.000 0.001 0.000 0.000 0.001 0.000 0.000 1.000 0.000 -0.000 0.002 0.000 0.000 0.000 0.001 0.001 0.000 -0.002
Occupation 0.000 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.000 1.000 0.001 0.001 0.000 0.000 0.000 0.001 0.000 0.000 0.000
Paid_Amount -0.001 -0.001 0.001 0.001 -0.001 0.000 -0.000 0.001 -0.000 0.001 1.000 0.000 0.000 0.004 0.000 0.000 0.002 0.001 -0.000
Patient_Age 0.000 0.000 0.000 0.001 0.000 0.001 0.000 0.001 0.002 0.001 0.000 1.000 0.000 0.002 0.001 0.001 0.002 0.001 -0.000
Patient_Gender 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.001 0.000 0.003 0.000 0.001 -0.000
Payment_Type 0.000 0.002 0.000 0.001 0.002 0.000 0.000 0.002 0.000 0.000 0.004 0.002 0.001 1.000 0.000 0.000 0.001 0.000 -0.001
Policy_Type 0.000 0.000 0.000 0.003 0.000 0.003 0.002 0.000 0.000 0.000 0.000 0.001 0.000 0.000 1.000 0.001 0.000 0.000 0.001
Provider_ID 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.003 0.000 0.001 1.000 0.000 0.001 0.002
Provider_Specialty 0.001 0.000 0.000 0.000 0.000 0.001 0.000 0.001 0.001 0.000 0.002 0.002 0.000 0.001 0.000 0.000 1.000 0.001 -0.001
State 0.000 0.000 0.002 0.000 0.000 0.000 0.001 0.001 0.000 0.000 0.001 0.001 0.001 0.000 0.000 0.001 0.001 1.000 0.001
Total_Charges -0.001 -0.000 0.000 0.001 -0.001 0.001 -0.002 0.000 -0.002 0.000 -0.000 -0.000 -0.000 -0.001 0.001 0.002 -0.001 0.001 1.000

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
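
The per-column nullity shown in the first figure amounts to counting missing cells in each column. The sketch below does this for rows held as plain dictionaries. The helper name `nullity_by_column` is hypothetical, and treating both `None` and the empty string as missing is an assumption; the profiler's own definition of a missing value may differ.

```python
def nullity_by_column(rows):
    """Count missing cells per column for a list of dict rows.

    Hypothetical helper for illustration. A cell counts as missing when it
    is None or an empty string (an assumption, not the profiler's rule).
    """
    counts = {}
    for row in rows:
        for column, value in row.items():
            missing = value is None or value == ""
            counts[column] = counts.get(column, 0) + int(missing)
    return counts


# Two rows abridged from the Sample section of this report (three columns only).
rows = [
    {"Provider_ID": "Asian Medical Center", "Claim_ID": "CLAIM_1", "Marital_Status": "Divorced"},
    {"Provider_ID": "Sky Hospital", "Claim_ID": "CLAIM_2", "Marital_Status": "Married"},
]
result = nullity_by_column(rows)
print(result)
```

Every count is zero for these rows, in line with the report's overall figure of 0 missing cells.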

Sample

Provider_ID Claim_ID Patient_ID Diagnosis_Code Procedure_Code Claim_Date Admission_Date Discharge_Date Claim_Amount Paid_Amount Provider_Specialty Patient_Age Patient_Gender Fraud_Label Investigation_Details Policy_Type Coverage_Amount Total_Charges Payment_Type State Email Phone_Number Address Nationality Passport_Number Employer Occupation Marital_Status Education_Level
0Asian Medical CenterCLAIM_1Darrell BlairDX_714PROC_26482024-04-242024-03-262024-05-081077.864362.78Orthopedics76Female1ClearedHMO3880.499913.47Credit CardBangkokcharlenekoch@example.org737.572.42309475 Christine Fort, Riveraview, TX 28683Korean317-45-4815Mitchell-McintoshTeacherDivorcedHigh School
1Sky HospitalCLAIM_2William YoungDX_885PROC_90842024-04-242024-04-072024-05-034998.885867.30Cardiology73Female1Under investigationPPO1541.037723.89Credit CardMumbaiayersmelanie@example.org001-284-213-6827x6429012 Martinez Bridge, Popeview, OK 75771Korean255-16-5382Galloway, Castillo and SmithBusinessmanMarriedPhD
2Moon HealthcareCLAIM_3Keith ReynoldsDX_988PROC_97472024-04-242024-04-012024-05-247058.218526.15Orthopedics34Male0ClearedHMO2047.669671.58Credit CardSeoulmadison17@example.com(320)856-69838544 Roberts Estate Apt. 392, Port Mistyshire, WY 86425Indian255-07-5680Phillips, Bryant and MurphyTeacherDivorcedMaster
3Sky HospitalCLAIM_4Andre KellyDX_779PROC_43342024-04-242024-03-312024-04-271628.678317.18Cardiology58Male0SuspiciousPPO3198.927887.55CheckTokyobrittany18@example.org860-217-150252151 Antonio Hill Suite 655, Lake Christian, NH 49512Thai210-10-2570Lee, Jackson and HoffmanDoctorDivorcedBachelor
4Sun ClinicCLAIM_5Terry GonzalesDX_644PROC_84082024-04-242024-03-272024-05-121480.434136.33Orthopedics90Female0Under investigationPPO2935.93332.80Electronic Funds TransferMumbainharris@example.net658.620.10247112 Christopher Village Suite 120, North Emily, NJ 46503Thai044-28-8553Peterson, Lopez and BlakeEngineerDivorcedBachelor
5Moon HealthcareCLAIM_6Tony GordonDX_892PROC_23352024-04-242024-04-012024-05-193870.007623.96Orthopedics37Female0SuspiciousPPO3371.452564.29Electronic Funds TransferTokyojohnallen@example.org001-658-466-2696249 Robert Shoals Apt. 813, East Emily, NJ 06925Indian335-64-4833Odonnell LLCEngineerWidowedPhD
6Moon HealthcareCLAIM_7Ross AtkinsonDX_534PROC_72982024-04-242024-04-062024-05-224343.767886.82General Medicine44Male0Under investigationHMO4608.704185.24Electronic Funds TransferTokyodylanhamilton@example.com001-729-848-5510x68926394 Ashley Estates Suite 630, North Steven, PA 62365Indian289-19-4177Howard-FordEngineerMarriedHigh School
7Asian Medical CenterCLAIM_8Meghan BryantDX_235PROC_82732024-04-242024-04-022024-05-121477.686933.63General Medicine53Male0Under investigationPPO4629.352340.75CheckBeijinghayescatherine@example.org(712)653-1749x486818 Michael Canyon Apt. 314, Paulland, MH 23442Japanese535-78-2674Clark, Oneill and AndersonTeacherWidowedBachelor
8Asian Medical CenterCLAIM_9Melissa PetersenDX_850PROC_54372024-04-242024-04-202024-05-091405.244547.49Cardiology59Male1SuspiciousHMO1267.659384.24Credit CardBeijingdarrell46@example.net(730)907-6852x8417459062 Diana Harbor Apt. 850, West Emilymouth, VT 34613Korean366-96-8210Thompson-HartEngineerDivorcedMaster
9Asian Medical CenterCLAIM_10Diana SchmidtDX_931PROC_10642024-04-242024-04-232024-05-189063.09916.38Orthopedics50Female1ClearedHMO2532.278775.42Credit CardMumbaiparkerfrank@example.com+1-330-568-1956x9767661 Emily Gateway, Bergland, GU 41122Thai088-60-4249Johnston, Mclaughlin and WilliamsonDoctorMarriedBachelor
Provider_ID Claim_ID Patient_ID Diagnosis_Code Procedure_Code Claim_Date Admission_Date Discharge_Date Claim_Amount Paid_Amount Provider_Specialty Patient_Age Patient_Gender Fraud_Label Investigation_Details Policy_Type Coverage_Amount Total_Charges Payment_Type State Email Phone_Number Address Nationality Passport_Number Employer Occupation Marital_Status Education_Level
999990Eastern HospitalCLAIM_999991Lisa ParkerDX_178PROC_73082024-04-242024-04-122024-05-057971.30919.30Orthopedics48Female0Under investigationHMO2103.457022.81Electronic Funds TransferTokyomooresteven@example.org698-699-8032x31344895 Kelly Mission, New Andrea, PA 65056Japanese316-10-2860Cardenas LLCDoctorMarriedPhD
999991Sky HospitalCLAIM_999992Elizabeth RichardsDX_362PROC_10052024-04-242024-03-302024-05-239088.265037.26Cardiology69Male0ClearedPPO3592.409290.50Credit CardBangkokhillapril@example.org+1-345-313-3925x7389273 Victoria Greens Suite 025, Port Jamesside, ID 86274Chinese630-53-3415Marquez-HolmesEngineerMarriedHigh School
999992Sky HospitalCLAIM_999993Margaret CarlsonDX_625PROC_85122024-04-242024-04-212024-05-095412.935131.20Orthopedics25Female1ClearedPPO2414.185344.02Credit CardBeijinghrobbins@example.net847.549.0627119 Prince Valley, Lake Carlosbury, WI 26754Chinese694-61-0166Higgins LtdTeacherDivorcedBachelor
999993Eastern HospitalCLAIM_999994Alexandra SmithDX_526PROC_11442024-04-242024-03-252024-05-041273.093300.31General Medicine47Female0ClearedHMO4431.86319.08CheckBangkoklopezronald@example.com957-801-6698731 Berg Unions Suite 451, Rossside, ME 80067Thai815-39-6192Murillo-MooreDoctorWidowedHigh School
999994Eastern HospitalCLAIM_999995Ms. Mariah BrownDX_776PROC_66432024-04-242024-04-032024-05-06997.5276.86General Medicine69Female1ClearedHMO2896.36949.42Electronic Funds TransferBangkokqshea@example.com(562)500-8789x0672604 Wyatt Junction Suite 541, Patrickville, MI 69528Thai640-71-5243Harper, Wagner and SampsonBusinessmanMarriedHigh School
999995Asian Medical CenterCLAIM_999996Evelyn RiversDX_590PROC_15432024-04-242024-03-252024-05-129576.713128.20Orthopedics85Male0ClearedHMO1505.569891.99CheckSeoulkhughes@example.org405.871.5546129 Kelly Forges, Longland, AZ 46430Thai840-84-6763Ross-MelendezEngineerWidowedBachelor
999996Moon HealthcareCLAIM_999997Robert WoodsDX_954PROC_68702024-04-242024-03-262024-05-094600.569604.02Orthopedics61Male1ClearedPPO4175.443968.83CheckTokyofroberts@example.net569-771-5484x20259281 Judy Crescent Suite 322, North Edwin, DC 52257Japanese318-96-8498Hansen GroupEngineerDivorcedHigh School
999997Sun ClinicCLAIM_999998David ThomasDX_302PROC_94052024-04-242024-03-252024-05-207103.052955.37General Medicine41Male0SuspiciousPPO1329.918615.29Electronic Funds TransferBangkokjameswilliams@example.org(962)996-9863x7621216 Michaela Rapid Suite 198, Williamsmouth, MO 10043Chinese174-92-1906Miller PLCTeacherMarriedBachelor
999998Asian Medical CenterCLAIM_999999Samantha HubbardDX_517PROC_49162024-04-242024-04-052024-05-21313.718483.86Cardiology50Female0ClearedHMO3388.072163.77Electronic Funds TransferBeijingtsmith@example.org423-752-932426176 Joshua Skyway Apt. 043, Riceton, GA 23354Thai203-65-1589Day, Stewart and FrostTeacherSingleMaster
999999Sun ClinicCLAIM_1000000Amy EdwardsDX_699PROC_79142024-04-242024-03-292024-05-081540.964121.44Orthopedics51Male1ClearedHMO2752.813194.72CheckMumbaibarbara46@example.net313-781-4871x689492795 Williams Mills, East Keith, MO 60890Indian562-34-1333Webb-RogersDoctorDivorcedPhD